Particle Value Functions

Authors

  • Chris J. Maddison
  • Dieterich Lawson
  • George Tucker
  • Nicolas Heess
  • Arnaud Doucet
  • Andriy Mnih
  • Yee Whye Teh
Abstract

The policy gradients of the expected return objective can react slowly to rare rewards. Yet, in some cases agents may wish to emphasize the low or high returns regardless of their probability. Borrowing from the economics and control literature, we review the risk-sensitive value function that arises from an exponential utility and illustrate its effects on an example. This risk-sensitive value function is not always applicable to reinforcement learning problems, so we introduce the particle value function defined by a particle filter over the distributions of an agent’s experience, which bounds the risk-sensitive one. We illustrate the benefit of the policy gradients of this objective in Cliffworld.
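
As a rough illustration of the risk-sensitive value function mentioned above, the sketch below assumes the common exponential-utility form V_β = (1/β) log E[exp(β R)], estimated from sampled returns. The abstract itself does not state the formula, so this form, the function name `risk_sensitive_value`, and the toy return distribution are assumptions made for illustration. It is not the particle value function, which the abstract defines through a particle filter.

```python
import math


def risk_sensitive_value(returns, beta):
    """Monte Carlo estimate of (1/beta) * log E[exp(beta * R)].

    Assumed exponential-utility form; beta = 0 is treated as the
    ordinary expected return (the limit as beta goes to 0).
    """
    if beta == 0.0:
        return sum(returns) / len(returns)
    scaled = [beta * r for r in returns]
    m = max(scaled)  # subtract the max before exponentiating for stability
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta


# Hypothetical toy data: a rare large reward among many zero returns.
returns = [0.0] * 99 + [100.0]

mean = risk_sensitive_value(returns, 0.0)     # plain expected return
seeking = risk_sensitive_value(returns, 0.1)  # beta > 0 emphasizes high returns
averse = risk_sensitive_value(returns, -0.1)  # beta < 0 emphasizes low returns

print(averse, mean, seeking)
```

Under this assumed form, a positive β weights the rare high return far more than its probability alone would, while a negative β pulls the value toward the low returns, which matches the abstract's point about emphasizing low or high returns regardless of their probability.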


Similar resources

Particle swarm optimization for a bi-objective web-based convergent product networks

Here, a collection of base functions and sub-functions configure the nodes of a web-based (digital) network representing functionalities. Each arc in the network is to be assigned as the link between two nodes. The aim is to find an optimal tree of functionalities in the network adding value to the product in the web environment. First, a purification process is performed in the product network ...

Optimum allocation of Iranian oil and gas resources using multi-objective linear programming and particle swarm optimization in resistive economy conditions

This research presents a model for optimal allocation of Iranian oil and gas resources in sanction condition based on stochastic linear multi-objective programming. The general policies of the resistive economy include expanding exports of gas, electricity, petrochemical and petroleum products, expanding the strategic oil and gas reserves, increasing added value through completing the petroleum...

An Interactive Fuzzy Satisfying Method Based on Particle Swarm Optimization for Multi-Objective Function in Reactive Power Market

Reactive power plays an important role in supporting real power transmission, maintaining system voltages within proper limits and overall system reliability. In this paper, the production cost of reactive power, cost of the system transmission loss, investment cost of capacitor banks and absolute value of total voltage deviation (TVD) are included into the objective function of the power flow ...

A self-guided Particle Swarm Optimization with Independent Dynamic Inertia Weights Setting on Each Particle

In the standard PSO algorithm, each particle in swarm has the same inertia weight settings and its values decrease from generation to generation, which can induce the decreasing of population diversity. As a result, it may fall into the local optimum. Besides, the decreasing of weights values is restricted by the maximum evolutionary generation, which has an influence on the convergence speed a...

Reproducing Polynomial(Singularity) Particle Methods and Adaptive Meshless Methods for 2-Dim Elliptic Boundary Value Problems

Oh et al ([25]) introduced the reproducing polynomial particle (RPP) shape functions that are piecewise polynomial and satisfy the Kronecker delta property. In this paper, we introduce RPPM (Reproducing Polynomial Particle Methods) that is the Galerkin approximation method associated with the use of the RPP approximation space. Planting particles in the computation domain in a patchwise uniform...

Journal:
  • CoRR

Volume: abs/1703.05820

Publication date: 2017